Real-time changepoint detection in a nonlinear expectile model

An online changepoint detection procedure based on conditional expectiles is introduced. The key contribution is threefold: the nonlinearity of the underlying model improves the overall flexibility, while a parametric form of the unknown regression function preserves a simple and straightforward interpretation; the conditional expectiles, well known in econometrics for being the only coherent and elicitable risk measure, introduce additional robustness, especially with respect to asymmetric error distributions common in various types of data; and the proposed statistical test is proved to be consistent, with a distribution under the null hypothesis that depends neither on the functional form of the underlying model nor on the unknown parameters. Empirical properties of the proposed real-time changepoint detection test are investigated in a simulation study, and its practical applicability is illustrated using Covid-19 prevalence data from Prague.


Introduction
It is a common task, not only in statistics, to provide procedures for detecting and estimating changepoints in all kinds of mathematical and stochastic models. Such procedures are also important from a practical point of view and are often crucial in real-life problems. For instance, detecting a changepoint in a data generating model may trigger model retraining mechanisms or, more frequently, it may govern important decisions affecting specific subjects or even whole populations, such as the pandemic restrictions related to the recent spread of Covid-19. On the other hand, the estimation of changepoints may lead to correction procedures, specific treatment implementations, additional target-specific decisions, or just a deeper understanding of the underlying data generating process.
Considering the basic stochastic principles of the changepoint detection and various estimation methods, two different approaches are usually adopted in practical implementations. If the whole data sample is available at the very beginning of the analysis, the detection algorithm is called an offline procedure. If the data arrive in time (usually in an observation-by-observation manner) and the changepoint detection algorithm runs concurrently as new observations appear, such algorithms are referred to as online procedures.
In this paper, we focus on the online regime, where the proposed changepoint detection algorithm is applied to a nonlinear parametric regression model. In addition to this nonlinearity, the conditional expectile estimation of the unknown parameters is adopted, similarly to Newey and Powell (1987), who, however, investigated a simple linear model. This yields a coherent risk measure while also accounting for possibly asymmetric random error distributions. The changepoint detection itself is performed in terms of a consistent statistical test which is based on an accumulating dataset used in each consecutive step of the proposed online procedure.
There is a vast literature on both offline and online changepoint detection strategies considering different models and various technical assumptions. Focusing on the online procedures only, Nedényi (2018) proposed an online testing approach based on a CUSUM test statistic to detect changes in a parameter of a discrete-time stochastic process. Linear regression models with independent error terms are considered in Chu et al (1996) and Horváth et al (2004), where a standard least squares estimator is employed. Possible detection delays in a sequential changepoint test for a multiple linear regression model are discussed in Aue et al (2009). Linear regression models with dependent observations are investigated in Fremdt (2015), and online changepoint detection procedures within autoregressive time series are studied, for instance, in Hušková et al (2007). Some generalizations for multivariate cases can be found in Aue et al (2009) or Hoga (2017), and their results are further generalized in Barassi et al (2020), where a semiparametric CUSUM test is proposed to perform the online changepoint detection for various correlation structures of nonlinear multivariate regression models with dynamically evolving volatilities. Nonlinear integer-valued time series are also discussed from this perspective in Lee and Lee (2019). A comprehensive review of the online procedures can also be found in Basseville and Nikiforov (1993).
The method presented in this paper follows the idea of semiparametric CUSUM approaches, combined with robustness with respect to the underlying error terms. First, a nonlinear regression model is assumed to govern the data generating process. Although the underlying regression function is deterministic, it is allowed to be nonlinear with respect to a set of unknown parameters. This introduces a relatively flexible class of possible functions. Second, despite the independent error terms assumed for the proposed online detection regime, no restrictive assumptions are imposed on the underlying error distribution and, in particular, substantial robustness is achieved with the proposed expectile estimation, which also allows for asymmetric and heavy-tailed error distributions. The conditional expectiles define the only coherent and elicitable risk measure (see, for instance, Bellini et al (2018) or Ziegel (2016)), which is particularly important in situations where some risk-related assessment is needed. Moreover, despite many similarities with conditional quantiles, the conditional expectiles are well known to be viable also in situations when the conditional quantiles fail (see Philipps (2022) for a more comprehensive comparison). Third, the proposed test statistic follows, under the null hypothesis of no change, a relatively simple distribution which depends neither on the underlying regression function nor on the set of unknown parameters. Finally, the whole procedure can be implemented in a straightforward way, and all necessary calculations performed within the proposed online regime can be easily obtained. Thus, the presented real-time changepoint detection method has great potential for practical applicability, which goes well beyond the Covid-19 example illustrated at the end.
The rest of the paper is structured as follows: The underlying data and the corresponding changepoint model are described in the next section. A real-time changepoint detection in terms of a formal statistical test is introduced in Sect. 3. The asymptotic properties of the proposed test are also detailed there. In Sect. 4, finite sample properties are investigated and the Covid-19 prevalence data from Prague, Czech Republic, are analysed using the proposed methodological framework. Section 5 concludes with some final remarks. All theoretical proofs and further technical details are postponed to the Appendix.

Asymmetric least squares with changepoint
Let us consider a set of historical data denoted as $\{(Y_i, X_i);\ i = 1, \dots, m\}$ for some deterministic $q$-dimensional vector of explanatory variables $X_i = (X_{i1}, \dots, X_{iq})$ and some integer $m \in \mathbb{N}$. The data are assumed to follow a general nonlinear parametric regression model

$$Y_i = f(X_i, \beta) + \varepsilon_i, \qquad i = 1, \dots, m, \tag{1}$$

where $f(\cdot, \beta)$ is an explicit function depending on some unknown vector parameter $\beta = (\beta_1, \dots, \beta_p)$ taking values in a parameter space contained in $\mathbb{R}^p$, with the true (unknown) value denoted as $\beta_0 \in \mathbb{R}^p$. A different approach could treat the $X_i$'s as random vectors; however, we concentrate on the fixed design, as we want to adopt a robust (i.e., distribution-free) approach with only minimal assumptions imposed on the underlying data distribution. Nevertheless, with respect to the forthcoming theory, analogous results for the random design can be derived as well (under some technical assumptions needed for the deterministic convergences to become convergences in probability). After the historical data are observed, another $T_m \in \mathbb{N}$ observations are measured sequentially for both the response variable $Y_i$ and the explanatory vector $X_i \in \Upsilon \subseteq \mathbb{R}^q$, for $i = m + 1, \dots, m + T_m$. The underlying model for these new observations (the online data) is assumed to take an analogous form

$$Y_i = f(X_i, \beta_i) + \varepsilon_i, \qquad i = m + 1, \dots, m + T_m, \tag{2}$$

where the underlying regression functional form remains the same and $\beta_i \in \mathbb{R}^p$. For the parameter vectors $\{\beta_i\}_{i=m+1}^{m+T_m}$ in (2), it is either assumed that their true (unknown) values are all equal to $\beta_0$ (thus, there is no changepoint present in the overall combined model (1) and (2)), or that the true value changes from $\beta_0$ to some $\beta_1 \neq \beta_0$ at an unknown point within the online data. The error terms $\{\varepsilon_i\}_{1 \le i \le m + T_m}$ from the overall model (1) and (2) are assumed to be independent and identically distributed. A generic random error term from the underlying distribution is denoted as $\varepsilon$. The idea is to use the historical data to estimate the unknown parameter vector $\beta \in \mathbb{R}^p$.
Later, the online data, starting from the observation index $i = m + 1$, are measured in real time, and for each new observation $i \ge m + 1$ we ask whether the underlying model remains unchanged (i.e., $\beta_i = \beta_0$) or whether a change in the unknown parameter vectors $\beta_i \in \mathbb{R}^p$ is detected. If no changepoint is detected for the given $i$, then all available observations are used in the next step to ask the same question regarding the new, most recent observation. The whole changepoint detection process stops at the first observation $i \in \{m + 1, \dots, m + T_m\}$ for which there is statistical evidence that $\beta_i \neq \beta_0$.
From a formal theoretical point of view, at the first step, the historical data $\{(Y_i, X_i) : i = 1, \dots, m\}$ are used to obtain a conditional expectile estimator for the unknown parameter vector $\beta \in \mathbb{R}^p$. In particular, for a given expectile index $\tau \in (0, 1)$ the expectile function is defined as

$$\rho_\tau(x) = \big|\tau - \mathbb{1}_{\{x < 0\}}\big|\, x^2, \qquad x \in \mathbb{R}, \tag{3}$$

and the corresponding expectile estimator of the unknown (true) parameter vector $\beta_0 \in \mathbb{R}^p$ from the model in (1) is defined as

$$\widehat{\beta}_m = \arg\min_{\beta} \sum_{i=1}^{m} \rho_\tau\big(Y_i - f(X_i, \beta)\big), \tag{4}$$

where the minimum is taken over the parameter space and $\widehat{\beta}_m = (\widehat{\beta}_{m1}, \dots, \widehat{\beta}_{mp}) \in \mathbb{R}^p$. It is straightforward to verify that for $\tau = 1/2$ the expectile estimate $\widehat{\beta}_m$ defined by (4) reduces to a standard (nonlinear) least squares (LS) estimator of $\beta_0 \in \mathbb{R}^p$. In general, the $\tau$-th expectile of the given distribution can be interpreted as a hypothetical mean of some other distribution that would be obtained if the values above the expectile in the original distribution occurred $\tau/(1 - \tau)$ times more frequently. Thus, the choice of $\tau \in (0, 1)$ can also be seen as an "exploratory" approach that "balances" the distribution towards the (zero) mean and provides useful information about the skewness and possible outlying or extreme observations. Also note that, depending on the choice of the regression function $f$, the minimization problem in (4) may or may not be convex. This restricts the choice of the algorithm used to obtain the final solution. For numerical issues and different techniques for fitting nonlinear models we refer to Chambers (1973). Computational aspects are further discussed in Sect. 4.
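To make the notion of an expectile concrete, the following Python sketch computes a sample expectile, which corresponds to the intercept-only special case of the minimization in (4). It assumes the standard asymmetric squared loss of Newey and Powell (1987), $\rho_\tau(u) = |\tau - 1\{u < 0\}|\,u^2$; the function names and the fixed-point iteration are illustrative choices, not taken from the paper.

```python
def expectile_loss(u, tau):
    """Asymmetric squared loss: weight (1 - tau) for u < 0, weight tau otherwise."""
    weight = (1.0 - tau) if u < 0 else tau
    return weight * u * u


def sample_expectile(values, tau, tol=1e-12, max_iter=1000):
    """Minimise sum(expectile_loss(y - mu, tau)) over mu by iterated weighted means.

    The first-order condition sum(w_i * (y_i - mu)) = 0 says that the expectile
    is a weighted mean whose weights depend on the side of mu each value lies on.
    """
    mu = sum(values) / len(values)  # the tau = 1/2 solution serves as a start
    for _ in range(max_iter):
        weights = [(1.0 - tau) if y < mu else tau for y in values]
        new_mu = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

For $\tau = 1/2$ all weights coincide and the routine returns the sample mean, in line with the reduction to least squares noted above; for $\tau > 1/2$ the result moves towards the upper tail of the sample.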
In the second step, the expectile estimator $\widehat{\beta}_m$ obtained from the historical data $\{(Y_i, X_i);\ i = 1, \dots, m\}$ is used to perform a real-time changepoint detection in the online data $\{(Y_i, X_i);\ i = m + 1, \dots, m + T_m\}$ in terms of a formal statistical test of the null hypothesis (5) against the alternative hypothesis (6), where $\beta_0 \neq \beta_1$. The proposed test statistic, sensitive to the null hypothesis, is defined in (7) in terms of a standard supremum norm $\|\cdot\|_\infty$, a regularization function $z(m, k, \gamma) \equiv m^{1/2}(1 + k/m)(k/(k + m))^{\gamma}$ for some $\gamma \in [0, 1/2)$, and the quantities given in (8) and (9). In addition, $\nabla f(X_i, \widehat{\beta}_m)$ stands for a $p$-dimensional vector of the first partial derivatives $\frac{\partial}{\partial \beta} f(X_i, \beta)$ evaluated at the expectile estimate $\widehat{\beta}_m$, and $J_m^{-1/2}(\widehat{\beta}_m)$ in (8) denotes the inverse of the square root matrix (in the sense of the Cholesky factorization) of $J_m(\widehat{\beta}_m)$. A formal decision with respect to the null hypothesis in (5) is made by comparing the test statistic in (7) with the corresponding quantile of the limit distribution, which is a functional of a Wiener process (see Theorem 2). Details regarding the behaviour of the test statistic under the null hypothesis and the alternative hypothesis are derived in the next section.
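The regularization function $z(m, k, \gamma)$ is the one ingredient of the test statistic that is fully specified in the text, so it can be coded directly. The sketch below is a plain transcription; the function name `regularization` is a hypothetical choice.

```python
import math


def regularization(m, k, gamma):
    """z(m, k, gamma) = sqrt(m) * (1 + k/m) * (k / (k + m))**gamma.

    m     : number of historical observations
    k     : number of online observations seen so far (k >= 1)
    gamma : tuning constant, required to lie in [0, 1/2)
    """
    if not 0.0 <= gamma < 0.5:
        raise ValueError("gamma must lie in [0, 1/2)")
    return math.sqrt(m) * (1.0 + k / m) * (k / (k + m)) ** gamma
```

Since $k/(k + m) < 1$, a larger $\gamma$ makes $z$ smaller for small $k$, so the normalized statistic reacts more strongly to changes that occur shortly after the historical period.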

Remark 1
In practical applications, the theoretical quantity $\operatorname{Var}[g_\tau(\varepsilon)]$ in (9) is typically unknown. However, the corresponding finite sample counterpart can be used instead as a plug-in estimate, where $\widehat{\tau} \in (0, 1)$
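One possible plug-in construction can be sketched as follows. It assumes that $g_\tau$ denotes the derivative of the asymmetric squared loss, $g_\tau(u) = 2\tau u$ for $u \ge 0$ and $2(1 - \tau)u$ for $u < 0$, and that $\widehat{\tau}$ is chosen so that the residual scores average to zero; both the closed form below and the use of the plain sample variance are assumptions of this sketch, since the exact estimator of the remark is not recoverable from the text.

```python
def g_tau(u, tau):
    """Assumed score function: derivative of the asymmetric squared loss."""
    return 2.0 * (tau if u >= 0 else 1.0 - tau) * u


def plug_in_tau(residuals):
    """Solve sum(g_tau(e_i)) = 0 for tau.

    With pos = sum of positive parts and neg = sum of negative parts,
    tau * pos = (1 - tau) * neg gives tau = neg / (pos + neg).
    """
    neg = sum(-e for e in residuals if e < 0)
    pos = sum(e for e in residuals if e > 0)
    return neg / (pos + neg)


def plug_in_variance(residuals, tau):
    """Sample variance of the scores g_tau(e_i) (divisor n)."""
    scores = [g_tau(e, tau) for e in residuals]
    centre = sum(scores) / len(scores)
    return sum((s - centre) ** 2 for s in scores) / len(scores)
```

For residuals that are symmetric around zero the positive and negative parts balance and the estimate equals 1/2; residuals that are mostly positive give a value below 1/2.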

Model assumptions
Considering the overall changepoint model in (1) and (2), the theoretical results formulated in this section rely on the set of assumptions stated below. For a better organization of the whole paper, the assumptions are split into five groups, (A)-(E).

ASSUMPTION (A): (A1) The parameter space, a subset of $\mathbb{R}^p$, is a compact set and the design space $\Upsilon \subseteq \mathbb{R}^q$ is assumed to be bounded.

The density function of the random error terms $\{\varepsilon_i\}_{i=1}^{m+T_m}$ (the generic error term $\varepsilon$ respectively) is continuous and strictly positive at zero.

are independent and identically distributed.

Assumptions (A), (B), and (C) are common conditions needed to show a strong consistency of the conditional expectile estimate $\widehat{\beta}_m$ defined in (4). Analogous conditions are used, for instance, by Choi et al (2003). Similarly, Assumption (D) is quite standard for the expectile models (e.g., Gu and Zou (2016), Kim and Lee (2016), or Ciuperca (2022)).

Asymptotic behaviour of the expectile estimator
In order to study the asymptotic behaviour of the expectile estimator $\widehat{\beta}_m$ defined in (4) let us consider the $p$-square matrix In addition to Assumption (A2), it is also required to impose slightly stricter assumptions on the matrix of the second partial derivatives $\nabla^2 f(x, \beta)$.

ASSUMPTION (E): The elements of $\nabla^2 f(x, \beta)$ are all bounded for any $x \in \Upsilon$ and for $\beta$ from a neighborhood of $\beta_0$ of radius of the order $m^{-1/2}$.
The assumption above is a common property which is, under Assumption (A1), satisfied by any function $f$ which is continuous on the product of the design space $\Upsilon$ and the parameter space. It is considered, for instance, for a sequential test in a nonlinear changepoint model in Ciuperca (2013), where an ordinary least squares (LS) estimation framework was used instead. For the expectile estimation framework proposed in this paper, the asymptotic behaviour of the estimator in (4) is formulated in the next proposition.

Proposition 1 Under Assumptions (A)-(E),
If the regression function $f$ is linear in $\beta$, then the asymptotic behaviour in the proposition reduces to a special case of Proposition 1 from Ciuperca (2022). Similarly, if the regression function $f$ in (1) is nonlinear in $\beta$ but the random error terms follow a normal distribution $N(0, \sigma^2)$ with $\sigma^2 < \infty$, the asymptotic behaviour in Proposition 1 gives the results of Theorem 2.1 in Seber and Wild (2003).

Test statistic under H 0 and H 1
The asymptotic behaviour of the test statistic defined in (7) is investigated in this section under both the null hypothesis in (5) and the alternative hypothesis in (6). Note that the vectors of parameters (9). Considering the size $m \in \mathbb{N}$ of the historical data and the size $T_m \in \mathbb{N}$ of the online data, there are two specific possibilities which should be considered separately:
• if $\lim_{m \to \infty} T_m/m = \infty$ for either $T_m = \infty$ or $T_m < \infty$, then such a scenario is called an open-end procedure;
• if $\lim_{m \to \infty} T_m/m = T$ for some finite $T$, then such a scenario is called a closed-end procedure.
By a common convention, it is usually assumed that for the open-end procedures it holds that $T = \infty$. The test statistic in Theorem 2 is based on the expectile estimator $\widehat{\beta}_m$ of the true parameter vector $\beta_0$ calculated from the historical data. However, the limit process is the same as for the expectile estimator in the linear model considered in Ciuperca (2022), or the quantile estimator proposed in Zhou et al (2015). On the other hand, the test statistic is different from that proposed by Ciuperca (2013) or Horváth et al (2004), where the authors considered a CUSUM-type statistic based on the least squares residuals of the nonlinear model or the linear model, respectively.

Theorem 2 Let Assumptions (A)-(E) be satisfied. Then, under H
In addition, the asymptotic behaviour of the test statistic under the null hypothesis in Theorem 2 depends neither on the underlying form of the nonlinear regression function $f$ nor on the true value $\beta_0$, unlike the test statistic for the parametric nonlinear model proposed in Ciuperca (2013). Therefore, the test statistic in Theorem 2 is generally less restrictive, easier to use, and more straightforward to apply, also for the least squares estimation (i.e., when $\tau = 1/2$).
For the behaviour of the test statistic under the alternative hypothesis, more caution is needed. The model in (1) changes after the historical data and this change must be identifiable. Consequently, some reasonable assumptions are needed for the difference between the true parameter vectors β 0 and β 1 and, also, the underlying regression function f . Specific details are formulated in the next theorem.

Theorem 3 Let Assumptions (A)-(E) be satisfied and let m
Considering the assertions of both theorems together, the statistical test based on the proposed test statistic in (7) is proved to be consistent. The decision rule can be defined directly by considering the corresponding quantiles of the limit process from Theorem 2.
On the basis of the results obtained above, one can define a stopping time, i.e., the first observation for which the null hypothesis in (5) is rejected in favor of the alternative hypothesis at the significance level $\alpha \in (0, 1)$. The corresponding changepoint estimate $\widehat{k}_m$ is defined as the first online index at which the test statistic in (7) exceeds the corresponding quantile of the limit distribution. Note that $\widehat{k}_m$ is the index referring to the online data only (i.e., $\widehat{k}_m \in \{1, \dots, T_m\}$). Thus, from the overall point of view, the underlying model changes after $m + \widehat{k}_m$ observations. It holds that $\lim_{m \to \infty} P[\widehat{k}_m < \infty \mid H_0 \text{ true}] = \alpha$ and, similarly, $\lim_{m \to \infty} P[\widehat{k}_m < \infty \mid H_1 \text{ true}] = 1$. Hence, the proposed test is consistent.
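The stopping rule described above can be sketched in Python as follows. The sketch assumes that the detector at online step $k$ is the supremum norm of the normalized partial sum divided by $z(m, k, \gamma)$, and that these norms are supplied by the caller; the function name, the argument layout, and the use of `None` to encode "no detection" (an infinite stopping time) are hypothetical choices.

```python
import math


def stopping_time(sup_norms, m, gamma, critical_value):
    """Return the first online index k (1-based) at which the detector
    sup_norms[k - 1] / z(m, k, gamma) exceeds critical_value, else None.

    sup_norms      : sup-norms of the normalized partial sums, one per online step
    m              : number of historical observations
    gamma          : regularization constant in [0, 1/2)
    critical_value : quantile of the limit distribution for the chosen level
    """
    for k, norm in enumerate(sup_norms, start=1):
        z = math.sqrt(m) * (1.0 + k / m) * (k / (k + m)) ** gamma
        if norm / z > critical_value:
            return k  # monitoring stops at the first rejection
    return None  # no rejection within the observed online data
```

In an online setting the same comparison would be evaluated once per arriving observation, so the loop simply mirrors the sequential monitoring over a stored sequence.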

Empirical study
Finite sample properties of the proposed real-time changepoint detection method based on the expectile estimator defined in (4) are closely investigated in this section. Firstly, the empirical level of the test is assessed under various settings and the empirical power of the test is investigated for various changepoint scenarios. In the second part, the proposed methodology is also applied to analyze the Covid-19 prevalence data from Prague, Czech Republic, in order to link some authorities' decisions to the real-time pandemic situation.

Simulation experiment
The main concept of the simulation study is analogous to that presented in Choi et al (2003). However, instead of the simple exponential function used there for the underlying regression function, a different nonlinear regression function is employed, where $\beta_0 = (\beta_1, \beta_2) \equiv (10, 5)$ and $x \in (0, 1)$. The reason is that the function used in Choi et al (2003) becomes very insensitive to any parameter change for large $x_t = t$ (even for $t \ge 10$). A simple iterative grid search algorithm is implemented to solve (4) and the changepoint test is performed in terms of Theorem 2. For the length of the historical period, three different options are considered ($m \in \{20, 50, 200\}$). Analogously to Choi et al (2003), three error distributions are used: a symmetric standard normal distribution (with $\tau = 0.5$), an asymmetric normal distribution with the mean and variance both equal to one ($\tau = 0.0719$), and, finally, a heavy-tailed (symmetric) Laplace distribution with zero mean and unit variance (again, $\tau = 0.5$ due to the symmetry). In order to mimic both situations, the closed-end scenario and the open-end scenario, there are again three options considered for $T_m \in \{10, m/2, m \log m\}$. The empirical results under the null hypothesis (of no change in the model) are summarized in Table 1 and in Fig. 1. Different values for the regularization parameter $\gamma \in [0, 1/2)$ were considered as well but no substantial differences were found; therefore, all reported results are for $\gamma = 0.1$ only. The empirical level of the test seems to properly keep the nominal level of $\alpha = 0.05$ for all considered scenarios. The results are slightly conservative for the symmetric distributions (the normal distribution $N(0, 1)$ and the double exponential distribution $L(0, 1)$).

Table 1 Simulation results under the null hypothesis (with the theoretical value of $\tau = 0.5$ for the symmetric distributions and the empirical estimate $\widehat{\tau} = 0.0719$ in terms of Remark 1 for the asymmetric distribution)
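An iterative grid search of the kind mentioned above can be sketched as follows for a two-parameter model. This is a minimal illustration only: the zooming rule, the grid sizes, and in particular the curve `hypothetical_curve` are assumptions of the sketch and do not reproduce the regression function or the algorithm settings of the simulation study.

```python
import math


def expectile_objective(beta, xs, ys, f, tau):
    """Sum of asymmetric squared losses of the residuals y - f(x, beta)."""
    total = 0.0
    for x, y in zip(xs, ys):
        u = y - f(x, beta)
        total += ((1.0 - tau) if u < 0 else tau) * u * u
    return total


def iterative_grid_search(xs, ys, f, tau, box, points=21, rounds=8):
    """Minimise the objective over a 2-D box by repeatedly refining a grid.

    box = [(lo1, hi1), (lo2, hi2)]; after each round the grid is re-centred
    at the best point and spans one previous grid step in each direction.
    """
    (lo1, hi1), (lo2, hi2) = box
    best = None
    for _ in range(rounds):
        step1 = (hi1 - lo1) / (points - 1)
        step2 = (hi2 - lo2) / (points - 1)
        best_value = math.inf
        for i in range(points):
            for j in range(points):
                beta = (lo1 + i * step1, lo2 + j * step2)
                value = expectile_objective(beta, xs, ys, f, tau)
                if value < best_value:
                    best_value, best = value, beta
        lo1, hi1 = best[0] - step1, best[0] + step1
        lo2, hi2 = best[1] - step2, best[1] + step2
    return best


def hypothetical_curve(x, beta):
    """Illustrative nonlinear curve (NOT the one used in the paper)."""
    return beta[0] * math.exp(-beta[1] * x)
```

A grid search needs no derivatives and no convexity, which is why it suits a possibly non-convex objective; the price is that a coarse initial grid may zoom into a local minimum, so the initial box and the number of grid points should be chosen generously.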
On the other hand, a slightly underestimated nominal level is observed for the asymmetric error distribution (the normal distribution $N(1, 1)$), but the actual differences are rather negligible. The corresponding expectile estimates of the unknown (true) parameters $\beta_1 = 10$ and $\beta_2 = 5$ both seem to be consistent for all considered scenarios and no inconsistencies are observed in Table 1. The situation under the alternative hypothesis is more complex, as there are many different changepoint scenarios to take into account. For brevity, only the results for one representative situation are provided in this manuscript, but many other situations were considered and compared, with rather analogous results.
In particular, the following simulation scenarios under the alternative hypothesis were considered:
• A change occurs either in $\beta_1$, or in $\beta_2$, or in both elements of $\beta = (\beta_1, \beta_2)$ simultaneously;
• A change occurs immediately after the historical data, or the changepoint occurs after the first half of the online data;
• The magnitude of the change is relatively small compared to the true parameter values (a 20% change with respect to the true value), or the change is relatively large (the parameter(s) after the changepoint is (are) doubled);
• Finally, if the changepoint occurs in both elements of $\beta = (\beta_1, \beta_2)$, the corresponding effects of the changes may act against each other, so that the resulting regression function after the change is very similar to the regression function before the change, or, alternatively, the effects of the changes act in the same direction, so that the regression function after the change is quite different from the one before the change and the data carry more power to reveal such a change.
All these situations have, of course, an important impact on the simulation results and, in particular, on the performance of the proposed test in terms of its empirical power. For illustration purposes, one particular scheme (with the changepoint in $\beta_2$ only and the change magnitude equal to the true value of $\beta_2$) is reported in Table 2. The table shows that the performance of the proposed test (in terms of the empirical power) mostly depends on the true changepoint location and the length of the online data, but in all considered situations the proposed test seems to be consistent.
Note that for the situations where the changepoint occurs after the first half of the online data (the rows denoted as $k_m^{(1)}$ in Table 2), some of the observed rejections (roughly 5%) are false rejections that occur in the first half of the online data, before the actual change appears. Such false rejections are not considered in Table 2 and only the rejections after the first half of the online data are reported. This is also reflected by the fact that the average and median changepoint location indicators in the brackets are always greater than 0.50, which stands for the half of the online data sequence.

Table 2 The change in $\beta_2$ occurs either after the first half of the online data, for $k_m^{(1)} = T_m/2 + 1$, or at the very beginning of the online data, thus $k_m^{(2)} = 1$. The empirical powers for different $T_m$, different error distributions, and three sizes of the historical data are given in terms of the relative proportions (using $\tau = 0.5$ for the symmetric distributions and $\widehat{\tau} = 0.0719$ for the asymmetric distribution). Two changepoint indicators are also given in the brackets: the average changepoint location index and the median changepoint location index. Values close to zero stand for an early changepoint detection (zero standing for the detection at the first available observation) and values close to one mean late changepoint discoveries (one standing for the detection at the last available observation)

The average changepoint location indicator of, say, 0.25 indicates that the changepoint was estimated (when averaged over all simulations) after the first quarter of the online data. If the median location indicator (the second value in the brackets) is higher than the average, then the majority of the changepoint recoveries occurred after the first quarter, but there were also some relatively rare although very early recoveries (including the very first online observation). On the other hand, if the median location indicator is smaller than the average indicator, the majority of the changepoint recoveries occurred before the first quarter, but there were also some very late recoveries (including the very last observations).

Fig. 2 The model in (10) is fitted on the historical data, i.e., the data before the restrictions release on December 1, 2020. The projection of the model into the future is shown in dashed red. The estimated saturation of K = 189 616 is visualized in dotted red
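The two location indicators can be computed from the detected online indices as in the following sketch. The rescaling $(k - 1)/(T_m - 1)$ is an assumption chosen so that a detection at the first online observation maps to zero and a detection at the last one maps to one, as described for Table 2; the function name is hypothetical.

```python
from statistics import mean, median


def location_indicators(detections, horizon):
    """Average and median relative detection location over simulation runs.

    detections : 1-based online indices at which the test rejected
    horizon    : length T_m of the online data (must exceed 1)
    """
    scaled = [(k - 1) / (horizon - 1) for k in detections]
    return mean(scaled), median(scaled)
```

A median above the average then points to a few very early detections pulling the average down, and a median below the average to a few very late ones.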

Covid-19 prevalence
The world was recently strongly affected by the Covid-19 pandemic. We therefore apply the proposed estimation and changepoint detection method to a nonlinear parametric population risk model, a three-parameter Gompertz curve, in order to model the cumulative counts of the Covid-19 positive cases in Prague, the capital of the Czech Republic, over the period from the first positive case appearance (March 1, 2020) until the end of May 2021. The data, provided for academic purposes by the Institute of Health Information and Statistics of the Czech Republic, are assumed to follow a typical nonlinear (growth) model in (1), with the regression function given by the Gompertz curve in (10) for the unknown parameter vector $\beta = (\beta_1, \beta_2, K) \in \mathbb{R}^3_+$. The univariate explanatory variables $X_i \equiv x_i$ stand for the current day and the dependent random variables $Y_i$ in (1) reflect the cumulative Covid-19 positive cases at the given day. A similar population growth model, a five-parameter logistic curve, was recently applied in Chen et al (2020) to predict the overall number of positive Covid-19 cases in the US. The resulting model, however, turned out to heavily underestimate the true number of positive cases, which could also be caused by the underlying distributional symmetry assumption.
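For orientation, a three-parameter Gompertz curve can be coded as below. The parametrization $f(x, \beta) = K \exp\{-\beta_1 \exp(-\beta_2 x)\}$ is a common textbook form and is an assumption of this sketch, since the displayed equation (10) is not reproduced in the text; the function and argument names are hypothetical.

```python
import math


def gompertz(day, beta1, beta2, saturation):
    """Assumed Gompertz form: saturation * exp(-beta1 * exp(-beta2 * day)).

    beta1      : displacement (controls the level at day 0)
    beta2      : growth rate
    saturation : upper asymptote K, the limiting cumulative count
    """
    return saturation * math.exp(-beta1 * math.exp(-beta2 * day))
```

Under this form the curve starts at $K e^{-\beta_1}$ on day zero, increases monotonically for positive $\beta_1$ and $\beta_2$, and approaches the saturation level $K$, which is why $K$ can be read as the projected overall number of cases.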
In our approach, instead of trying to predict the overall positive cases, we pursue a slightly different goal. First, the data are split into two parts: the historical data, from the very first Covid-19 positive case in Prague until December 1, 2020 (when a rather populist and much criticized government decision waived some of the strict pandemic restrictions before Christmas), and the online data, arriving after December 1, 2020. Second, the proposed changepoint test is adopted to test whether the model before the government decision and the model after the government decision are the same. Finally, the model can also be used to obtain predictions of the overall Covid-19 positive cases over the whole follow-up period.
The data (daily positive cases) are visualized in Fig. 2a. The corresponding cumulative counts are given in the panel below (Fig. 2b). The Gompertz model from (10) is fitted on the historical data, that is, the period from March 1, 2020 until December 1, 2020. The estimated parameters are provided in Table 3. The estimated number of the overall Covid-19 positive cases is $\widehat{K} = 188\,576$, while the true number of all positive cases reported until May 26, 2021, is 184 959.
The proposed changepoint detection test based on (7) is performed to verify the stability of the model trained on the historical data, for $m = 275$, while new online data are arriving in a step-by-step manner (for $T_m = 176$). The values of the test statistic in (7) at each step of the online testing regime are plotted in Fig. 3a. The null hypothesis of no changepoint in the vector parameter $\beta = (\beta_1, \beta_2, K)$ is rejected relatively fast, just two days after the government reduced the restrictions: the corresponding test statistic is $T(m) = 4.1618$ for $m = 275$ and the corresponding 95% quantile of the limit distribution from Theorem 2 is $c_{0.95}(\gamma) = 2.4260$ for $\gamma = 0.1$. This may suggest that the actual change in the model occurred already before the online data, which can also be seen in Fig. 2, either from the first peak and the consecutive drop-off in panel (a) or from some evident underestimation at the end of the historical data in panel (b). The estimated parameters for the retrained model after the changepoint detection are, for comparison, also reported in Table 3.

Fig. 3 The test statistic profile for the online data in panel (a) and the first five days only for a more detailed insight in panel (b); the limit distribution from Theorem 2 with the corresponding 95% sample quantile $c_{0.95}(\gamma) = 2.4260$, for $\gamma = 0.1$, in panel (c); the model residuals from (10) with the corresponding density estimate, the empirical mean, and the empirical expectile for $\tau = 0.11$, such that the empirical counterpart of $E[g_\tau(\varepsilon)]$ equals zero, all in panel (d); finally, the residual autocorrelation and partial-autocorrelation plots in panels (e) and (f), respectively
Alternatively, one could also consider another (and maybe slightly more representative) set of historical data, from the very first case until the first culmination (i.e., the beginning of November 2020, thus $m = 245$), and test whether the model changes significantly after the peak as the daily Covid-19 cases start to decrease. The estimated parameters are very similar ($\widehat{\beta}_1 = 88.15$, $\widehat{\beta}_2 = 0.0166$, and $\widehat{K} = 197\,264$), but it takes 8 days for the proposed test statistic to detect a significant change in the model. Nevertheless, despite some obvious correlation among the model-based residuals (Fig. 3d and e), the estimated model seems to be relatively stable and the proposed changepoint detection test performs very well.

Conclusions
In this paper, we proposed an online procedure for testing the stability of a nonlinear parametric regression model within the conditional expectile estimation framework. There are three main pillars behind the proposed methodology. First, the nonlinear parametric form of the unknown regression function improves the overall flexibility of the model, while the dependence on the unknown parameters still preserves a relatively simple and straightforward interpretation of the overall regression function estimate. Second, the expectile estimation method allows for some additional robustness, especially with respect to asymmetric distributions. The estimation algorithm depends on the "asymmetry index" $\tau \in (0, 1)$, which is usually unknown, but it can either be anticipated from the data generating mechanism or replaced by a plug-in estimate. Third, the online regime for the changepoint detection makes the proposed method instantly applicable, which may turn out to be convenient in situations when real-time decisions and model adaptations are required. Finally, given the underlying regression function, the minimization problem formulated in (4) does not have to be convex; therefore, we proposed a widely applicable general iterative grid search algorithm which can be effectively used in practical applications.
The proposed methodological framework enriches the class of online procedures for changepoint detection. To the best of our knowledge, the specific model setup considered in this paper has not been studied in the literature yet. The empirical performance is illustrated through an extensive simulation study. The practical applicability of the whole methodological framework is illustrated on a real data example concerning some of the most recent challenges related to online decision making, especially essential decisions related to the Covid-19 pandemic made by local and global authorities.

expansion, we obtain By Assumption (C) we can define a $p$-square invertible matrix and using the relation in (A2) we obtain or, again, using Assumptions (C) and (D) to get which proves the given proposition.
In order to show the asymptotic behavior of the test statistic under the null hypothesis in (5) and the alternative hypothesis in (6), let us define a stochastic process […] for $k = 1, \dots, T_m$, where $u \in \mathbb{R}^p$ is such that $\|u\|_2 \le C$ for some constant $C < \infty$. In addition, let […] for $i = m + 1, \dots, m + T_m$, where the convergence rate of $\beta_m$ derived above is used. The following lemma is crucial for the proofs of the main theorems.

Lemma 4 Let Assumptions (A1), (A2), (D), and (E) be satisfied and let the null hypothesis in (5) hold. Then, for any constants $C_1, C_2 > 0$ and all $k \in \mathbb{N}$ large enough, there exists a constant $C_3 > 0$ such that P[sup […]] […] for $m \in \mathbb{N}$ sufficiently large.

Proof of Lemma 4
For any observation $i \in \{m + 1, \dots, m + T_m\}$ and any vector $u = (u_1, \dots, u_p)^\top \in \mathbb{R}^p$ such that $\|u\|_2 \le C_1$, we can express $R_i(u)$ as […]. Both terms in (A5) will be studied separately. Let us start with $B^{(1)}_i(u)$. Let us consider a random variable […], which also implies […]. The relation in (A6) can be written as […], which holds with probability one. On the other hand, by Assumptions (A1) and (A2), $\|\nabla f(x, \beta)\|_2$ is bounded for all $x \in \Upsilon$ and all $\beta \in V_m(\beta_0, u)$, with $V_m(\beta_0, u) = \{\beta;\ \|\beta - \beta_0\|_2 \le m^{-1/2}\|u\|_2\}$. For the right-hand side of (A8), using the fact that $\|\nabla f(x, \beta)\|_2$ is bounded, together with relation (A7), and applying the first-order Taylor expansion to $f(X_i, \beta_0 + m^{-1/2}u)$, we have that there exists $C > 0$ such that […]. For the left-hand side of (A8), we can write […]. Using the relations in (A6), (A8), and (A9), together with the fact that $\|\nabla f(x, \beta)\|_2$ is bounded for all $x \in \Upsilon$ and $\beta \in V_m(\beta_0, u)$, and applying the Hoeffding inequality, we obtain that, for all $t \in \mathbb{R}$ and $j = 1, \dots, p$, […], where, for brevity, we used the notation […]. Next, similarly as in the proof of Lemma 1 in Ciuperca (2022), under Assumptions (A1) and (A2) and using the last relation above, we have that for all constants $C_1, C_4 > 0$ there exists a constant $C > 0$ such that […]. Next, we proceed by studying the random vector $B^{(2)}_i(u)$ from the relation in (A5). Let us denote its $j$-th element, for $j = 1, \dots, p$, as […] for some constants $\theta_{ji} \in [0, 1]$. Under Assumption (D), it holds that $\mathrm{Var}[g_\tau(\varepsilon_i)] < \infty$ and also $E[g_\tau^2(\varepsilon_i)] < C < \infty$. Using these two relations, we obtain that, for each $j \in \{1, \dots, p\}$, […], where the Cauchy-Schwarz inequality was applied in the last step. Moreover, using Assumption (E), we have that for all $C_1 > 0$ there exists a constant $C_5 > 0$ such that […]. Therefore, we also obtain that […]. For $B^{(2)}_i(u)$, taking into account the relation in (A11) and the fact that $B_{ij}(u)$ is uniformly bounded by $Cm^{-1/2}$, we can use Lemma 4.1 of Ciuperca (2017) for $\delta_k = k/m$.
Then, for $k \in \mathbb{N}$ sufficiently large, for any constant $C_6 > 0$ and $\|u\|_2 < C_1$, we have […], which further implies, similarly as for (A10), that […]. Taking now $C_6 = C_4^2$ and $C = 2^{-1/2}(1 + C_4^2)/C_4$, we get […]. Moreover, for any constant $c > 0$ and any two random vectors $V_1$ and $V_2$ of the same size, it holds that […]. For the first inequality in (A14), we used the fact that, for any constant $c > 0$, the event $c \le \|V_1 + V_2\|_1 \le \|V_1\|_1 + \|V_2\|_1$ implies, with probability one, also the random event […]. Taking now the relations in (A10), (A13), and (A14), and the constants $C_4 = C_2$, $C_3 = 2^{-3/2}\min(C, C)$, and $c = pC_2C_3(k/m)^{1/2}(\log k)^{1/2}$, we obtain that […] for all $C_2 > 0$, which completes the proof of the lemma.
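For orientation, the two probabilistic tools invoked in the proof above can be recorded in their textbook forms. The following is only a sketch in generic notation: the symbols $Z_i$, $a_i$, $b_i$, and $n$ are placeholders introduced here and are not the notation of the paper, and the exact displays (A10) and (A14) may differ in constants and form.

```latex
% Hoeffding's inequality: Z_1, ..., Z_n independent with a_i <= Z_i <= b_i a.s.
\[
  P\Big[\Big|\sum_{i=1}^{n} \big(Z_i - E[Z_i]\big)\Big| \ge t\Big]
  \le 2\exp\Big(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\Big),
  \qquad t > 0.
\]

% Splitting a sum of two random vectors: if ||V_1 + V_2||_1 >= c, then the
% triangle inequality forces at least one of the two norms to be >= c/2.
\[
  \big\{\|V_1 + V_2\|_1 \ge c\big\}
  \subseteq \big\{\|V_1\|_1 \ge c/2\big\} \cup \big\{\|V_2\|_1 \ge c/2\big\},
\]
\[
  P\big[\|V_1 + V_2\|_1 \ge c\big]
  \le P\big[\|V_1\|_1 \ge c/2\big] + P\big[\|V_2\|_1 \ge c/2\big].
\]
```

The second bound is what allows the two terms in (A5) to be controlled separately and then combined.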
On the other hand, by the definition of the random process $r_{m,k}$, we get, with probability one, that […]. The last two relations imply

[…] $+\, o_P(km^{-1/2}) + O_P\big(m^{-1/2}k^{1/2}(\log k)^{1/2}\big)$. (A16)
The rest of the proof follows the same lines as the proof of Theorem 1 in Ciuperca (2022), using the Komlós-Major-Tusnády (KMT) approximation for independent random vectors and Theorem 2.1 of Horváth et al. (2004). Let us only sketch the main idea of the end of the proof. By the KMT approximation for independent, not identically distributed random variables (see Götze and Zaitsev (2009)), applied to each component of the random vectors $\big(J_m^{-1/2}(\beta_0)\nabla f(X_i, \beta_0)g_\tau(\varepsilon_i)\big)_{1 \le i \le m}$ and $\big(J_m^{-1/2}(\beta_0)\nabla f(X_i, \beta_0)g_\tau(\varepsilon_i)\big)_{m+1 \le i \le m+T_m}$, we have that for all $\nu > 3$ and $m \to \infty$ there exist two Wiener processes $\{W_{1,m}(t),\ t \in [0, \infty)\}$ and $\{W_{2,m}(t),\ t \in [0, \infty)\}$ of dimension $p$ such that, for the two terms on the right-hand side of the relation in (A16), it holds […].

We consider the open-end procedure case. Let us consider $k_m = k^0_m + m^s$, with $s > 1$. Since the function $(x + 1)(x/(1 + x))^{-\gamma}$ is increasing in $x$ on $(\gamma, \infty)$ for $\gamma \in [0, 1/2)$, and hence for all sufficiently large $x$, we have, as in the proof of Theorem 1, that there exists $C > 0$ such that, with probability converging to 1, […] when $m \to \infty$.
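The monotonicity used in the last step can be checked directly. The short computation below is a sketch for the function exactly as it is written above; the name $h$ is introduced here only for the calculation, and it is an assumption that the function is evaluated at $x = k_m/m$.

```latex
\[
  h(x) = (x + 1)\Big(\frac{x}{1 + x}\Big)^{-\gamma}
       = (1 + x)^{1 + \gamma}\, x^{-\gamma},
  \qquad x > 0,\ \gamma \in [0, 1/2),
\]
\[
  \frac{d}{dx}\log h(x)
  = \frac{1 + \gamma}{1 + x} - \frac{\gamma}{x}
  = \frac{x - \gamma}{x(1 + x)}.
\]
% Hence h'(x) > 0 exactly when x > gamma. For gamma = 0 this is all x > 0.
% With k_m = k_m^0 + m^s and s > 1 we have k_m / m >= m^{s-1} -> infinity,
% so the argument eventually lies in the increasing range.
```

Since $\gamma < 1/2$, the function is in particular increasing on $[1/2, \infty)$, which covers the regime $k_m/m \to \infty$ considered here.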
It remains to study $\sum_{i=m+k^0_m+1}^{m+k_m} \nabla f(X_i, \beta_m)\, g_\tau\big(\varepsilon_i + f(X_i, \beta_1) - f(X_i, \beta_m)\big)$ or, more precisely, taking into account the convergence rate of $\beta_m$, we are going to study […] with $u \in \mathbb{R}^p$, $\|u\|_2 < C$. Consider the following sum